(57) Abstract

A fault handling process in a computer system subject to CPU design errors and functioning under an operating system (OS) having an integral fault handling module includes the steps of: setting an intercept flag when a central processor fault occurs if the fault is to be directed to a preprocessor; establishing a safestore frame which includes information identifying the type of fault and whether the intercept flag is set; and transferring control to the OS fault handling module; then in the OS fault handling module, determining whether the intercept flag is set; if the intercept flag is not set, handling the fault in the OS fault module; if the intercept flag is set, transferring control from the OS fault module to an Intercept Process written in machine language; and handling the fault in the Intercept Process. This renders the resolution of faults due to correctable CPU design errors independent of the OS employed at a given installation and customizable to a given system without the need to revise the OS fault modules for each OS. As each such design error is worked out (e.g., by installing a substitute integrated circuit in which the error has been corrected), the Intercept Process (and CPU firmware) can be modified to remove monitoring and handling for faults due to the corrected error.
FAULT INTERCEPT AND RESOLUTION PROCESS INDEPENDENT OF OPERATING SYSTEM

Cross Reference to Related Provisional Application

This application claims the benefit of the filing date of U.S. Provisional Patent Application Serial No. 60/032,442 filed December 3, 1996, entitled INTERCEPT PROCESS by Sidney L. Andress.

Field of the Invention

This invention relates to computer central processors and, more particularly, to the repetitive temporary storage of central processing register contents and supporting information in a safestore in order to facilitate recovery from a fault or transfer to another domain. Still more particularly, this invention relates to a safestore feature which intercepts certain faults resulting from known system design errors and diverts the resolution process for handling such faults from the operating system fault handling facility to a special purpose software fault handling facility.

Background of the Invention

As personal computers and workstations have become more and more powerful, makers of mainframe computers have undertaken to provide features which cannot readily be matched by these smaller machines in order to stay viable in the marketplace. One such feature may be broadly referred to as fault tolerance, which means the ability to withstand and promptly recover from hardware faults and other faults without the loss of crucial information. The central processing units (CPUs) of mainframe computers typically have error and fault detection circuitry, and sometimes error recovery circuitry, built in at numerous information transfer points in the logic to detect and characterize any fault which might occur.

The CPU(s) of a given mainframe computer comprises many registers logically interconnected to achieve the ability to execute the repertoire of instructions characteristic of the CPU(s). In this environment, the achievement of genuinely fault tolerant operation, in which recovery from a detected fault can be instituted at a point in a program immediately preceding the faulting instruction/operation, requires that one or more recent copies of all the software visible registers (and supporting information also subject to change) must be maintained and constantly updated. This procedure is typically carried out by reiteratively sending copies of the registers and supporting information (safestore information) to a special, dedicated memory or memory section.

When a fault occurs and analysis determines that recovery is possible, the safestore information is used to reestablish the software visible registers in the CPU with the contents held recently before the fault occurred, so that restart can be instituted or tried from the corresponding place in program execution.

The logical design of modern CPUs, particularly mainframes, is enormously complex. Inevitably, logic design errors are present as the design process proceeds. If the specific hardware in which a design error is discovered is still in development, it can simply be corrected, sometimes with appropriate changes in firmware. However, if the faulting condition occurs so rarely and is so elusive that it is only discovered after systems have been installed for commercial and/or other field operation, the correction of the hardware/firmware (for example, by replacing an integrated circuit having the design error with one in which the error has been corrected) can be time consuming. Similarly, if a rarely occurring hardware fault is discovered during development, there may be good reason, such as meeting delivery schedules, to forego any immediate attempt to effect a definitive hardware/firmware correction. In both instances, a conventional, and generally effective, prior art approach has been to set up the CPU firmware to detect and refer faults to a fault processing module written into the operating system.

There are, however, drawbacks to this approach. When design errors are discovered, the resolution process for the resulting fault must be incorporated into the fault handling module of the operating system itself. This can be not only a formidable task, but the revisions to the operating system in all the systems in existence can be disruptive of normal operation. Further, some mainframe CPUs are configured to run under a plurality of operating systems. This requires changes to the fault processing modules of each operating system which can be accommodated by the CPUs. Still further, certain system design errors are often worked out, even after commercialization, as very large scale integrated circuits are modified and the chips changed out in individual installations. As a result, a feature in the operating system(s) introduced to handle a problem which no longer exists may adversely affect performance and certainly increases the amount of code in the operating system. It is to the solution of these related problems that the present invention is directed.

Objects of the Invention

It is therefore a broad object of this invention to provide, in a central processor, fault tolerant operation in which the storage and recovery of safestore information to handle certain faults is handled independent of the operating system.

It is a more specific object of this invention to provide a fault tolerant CPU in which the fault recovery process for certain predetermined faults is diverted from the operating system fault processing module to an independent facility implemented in software written in machine specific language.

WO 98/25222 PCT/US97/22185

Summary of the Invention

Briefly, these and other objects of the invention are achieved, in a fault tolerant central processing unit having data manipulation circuitry including a plurality of software visible registers, by providing a safestore memory for storing the contents of the plurality of software visible registers, after a data manipulation operation, in order to facilitate restart after a detected fault by transferring the corresponding contents of the safestore memory back to the software visible registers during recovery from the detected fault. More particularly, the subject process is employed in a computer system functioning under an operating system (OS) having an integral fault handling module and includes the steps of: setting an intercept flag when a central processor fault occurs if the fault is to be directed to a preprocessor; establishing a safestore frame which includes information identifying the type of fault and whether the intercept flag is set; and transferring control to the OS fault handling module; then in the OS fault handling module, determining whether the intercept flag is set; if the intercept flag is not set, handling the fault in the OS fault module; if the intercept flag is set, transferring control from the OS fault module to an Intercept Process written in machine language; and handling the fault in the Intercept Process.
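The sequence of steps recited above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class and function names (`SafestoreFrame`, `raise_fault`, `os_fault_module`, `intercept_process`) and the numeric fault codes are hypothetical and do not appear in the specification, and the actual Intercept Process is described as machine language code, not Python.

```python
from dataclasses import dataclass


@dataclass
class SafestoreFrame:
    """Hypothetical stand-in for the safestore frame built when a fault occurs."""
    fault_type: int        # illustrative numeric fault code
    intercept_flag: bool   # set when the fault is to be directed to the preprocessor


def intercept_process(frame: SafestoreFrame) -> str:
    # Stand-in for the machine language Intercept Process.
    return f"intercept process handled fault {frame.fault_type}"


def os_fault_module(frame: SafestoreFrame) -> str:
    # The OS fault module tests the intercept flag before doing anything else.
    if frame.intercept_flag:
        return intercept_process(frame)
    return f"OS fault module handled fault {frame.fault_type}"


def raise_fault(fault_type: int, intercepted_faults: set) -> str:
    # Set the flag, establish the frame, then transfer control to the OS module.
    frame = SafestoreFrame(fault_type, fault_type in intercepted_faults)
    return os_fault_module(frame)
```

In this model, adding or removing a fault code from `intercepted_faults` changes which handler runs without touching `os_fault_module`, which mirrors the independence from the OS that the process is intended to provide.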

This renders the resolution of faults due to correctable CPU design errors independent of the OS employed at a given installation and customizable to a given system without the need to revise the OS fault modules for each OS. As each such design error is worked out (e.g., by installing a substitute integrated circuit in which the error has been corrected), the Intercept Process (and CPU firmware) can be modified to remove monitoring and handling for faults due to the corrected error.

Description of the Drawing

The subject matter of the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, may best be understood by reference to the following description taken in conjunction with the subjoined claims and the accompanying drawing of which:

FIG. 1 is a high level block diagram of a multiprocessor computer system which is an exemplary environment for practicing the invention;

FIG. 2 is a slightly lower level block diagram showing additional details of an exemplary CPU board in the multiprocessor system of FIG. 1;

FIG. 3 is a block diagram showing additional details of a basic processing unit included within each CPU on the CPU board shown in FIG. 2;

FIG. 4 is a revised block diagram of the basic processing unit particularly showing the relationship of an auxiliary random access memory to the basic processing unit and its software visible registers (and supporting information) in accordance with the subject invention, the random access memory storing, inter alia, a Safestore Frame;

FIG. 5 is a system design flow chart of the Intercept Process used in handling faults in accordance with the invention; and

FIG. 6 is a process flow diagram illustrating the Intercept Process and the cooperation between CPU hardware, CPU firmware, the operating system and the subject Intercept Process in handling faults.

Description of the Preferred Embodiment(s)

Attention is first directed to FIG. 1 which is a high level block diagram of an exemplary multiprocessor computer system incorporating the invention. A first CPU board (CPU Board "0") 1 includes four central processor units 2 (CPU "0"), 4 (CPU "1"), 6 (CPU "2"), 8 (CPU "3"). Each of the central processor units 2, 4, 6, 8 situated on the first CPU board 1 includes an integral private cache memory module, 3, 5, 7, 9, respectively. The private cache modules 3, 5, 7, 9 are each configured as "store into"; i.e., the results of each completed operation performed in the CPU are stored into the private cache. Each of the private cache modules 3, 5, 7, 9 on CPU Board "0" 1 interfaces with a CPU bus 10 for direct communications between the CPUs 2, 4, 6, 8.

In the exemplary system, there are three additional CPU boards 12 (CPU Board "1"), 13 (CPU Board "2") and 14 (CPU Board "3"), each of which is substantially identical to CPU board 1 although those skilled in the multiprocessor art will understand that each CPU board and each CPU on each of the boards is assigned a unique identification number to facilitate communication and cooperation among the CPUs in the system.

CPU board 1 (i.e., CPU Board "0") also includes a shared cache 11 disposed between ("bridging") the CPU bus 10 and a system bus 15. It will be understood that each of the CPU boards 12, 13, 14 also includes a CPU bus and a shared cache, identically oriented.

A system control unit 16 serves to couple the system bus 15 to a main memory unit 17 via a memory bus 18. (It will be noted that the main memory unit 17 includes a Reserved Memory Space 50 - RMS - which will be discussed further below.) In addition, one or more input/output units 19 interface the system bus 15 with various input/output subsystems, not shown, to achieve input/output functions on a system basis, all as well known to those skilled in the art. Similarly, other subsystems 20, not otherwise specified or shown, may be connected to the system bus 15 to complete a given multiprocessor system. System control unit 16 also conventionally provides a multi-phase clock to all the system units requiring a common clock source. A service processor 21, typically a commercial personal computer or workstation, serves not only as a system and maintenance console, but also is used to boot the system and is employed extensively in analyzing and processing faults.

FIG. 2 is a slightly lower level block diagram of CPU "0" 2 of CPU board 1 (CPU Board "0") illustrating additional structure which is present in each CPU in the system. CPU "0" 2 includes a basic processing unit 22 and support circuitry 23 therefor.

As previously described, CPU "0" 2 also includes private cache module "0" 3 which constitutes a cache control unit 24 and a private cache 25 (which itself includes additional logic to be described below). Cache control unit 24 includes paging unit 26, cache management unit 27 and CPU bus unit 28. Paging unit 26 interfaces with basic processing unit "0" 22 and cache management unit 27. Cache management unit 27 also interfaces with private cache memory 25 and CPU bus unit 28. CPU bus unit 28 also interfaces with CPU bus 10 and, via CPU bus 10, shared cache 11. Private cache 25 is also coupled directly to receive information from and send information to the CPU bus 10 and to receive information from and send information to basic processing unit "0" 22.

As previously described, shared cache 11 also interfaces with system bus 15 and, via system bus 15, with system control unit 16 and other systems/subsystems shown in FIG. 1. Main memory 17, including Reserved Memory Space 50, may be accessed via the system control unit 16 and memory bus 18.

It will be seen that there are numerous paths for information flow among the various blocks shown in FIGs. 1 and 2. The types of information may include control, address, instructions and operands. A given CPU may directly access its own private cache module and indirectly access the private cache modules incorporated into the other CPUs on a shared CPU board. Thus, CPU "0" 2 can access, via the CPU bus 10, the shared cache 11 it shares with CPU "1" 4, CPU "2" 6 and CPU "3" 8. CPU "0" 2 can also, under defined conditions, access the private cache module of CPU "2" 6 (for example) via the CPU bus 10 to effect a local "siphon". Further, CPU "0" 2 can access (via CPU bus 10, shared cache 11 and system bus 15) the shared caches (not shown) on each of CPU Board "1" 12, CPU Board "2" 13 and CPU Board "3" 14. Still further, a given CPU may indirectly access the private cache modules (not shown) of a CPU (not shown) on another CPU board; e.g., CPU "0" on CPU board 1 (CPU Board "0") may, under defined conditions, access the private cache module of any one of the CPUs on CPU Board "2" 13 (FIG. 1) via CPU bus 10, shared cache 11, system bus 15 and the shared cache on CPU Board "2" to effect a remote "siphon".

Further yet, for example, CPU "0" 2 can access main memory 17, including RMS 50, via CPU bus 10, shared cache 11, system bus 15, SCU 16 and memory bus 18. Still further, for example, CPU "0" 2 can access, via CPU bus 10, shared cache 11 and system bus 15, any other block shown coupled to the system bus 15 in FIG. 1 to achieve bilateral communication with input/output devices, other subsystem components and even other multiprocessor systems.

FIG. 3 is a block diagram which includes additional details of a basic processing unit 22 in a system incorporating the present invention. The Address and Execution (AX) unit 30 is a microprocessing engine which performs all address preparation and executes all instructions except decimal arithmetic, binary floating point and multiply/divide instructions. The main functions performed by the AX unit 30 include: effective and virtual address formation; memory access control; security checks; register change/use control; execution of basic instructions, shift instructions, security instructions, character manipulation and miscellaneous instructions; and CLIMB safestore file.

Efficient scientific calculation capability is implemented in the Floating Point (FP) coprocessor unit 34. The FP unit 34 executes all binary floating point arithmetic. This unit, operating in concert with the AX unit 30, performs scalar or vector scientific processing. The FP unit 34: executes all binary and fixed and floating point multiply and divide operations; computes 12 by 72-bit partial products in one machine cycle; computes eight quotient bits per divide cycle; performs modulo 15 residue integrity checks; executes all floating point mantissa arithmetic; executes all exponent operations in either binary or hexadecimal format; preprocesses operands and post-processes results for multiply and divide instructions; and provides indicator and status control.

The DN unit 32 performs the execution of decimal numeric Extended Instruction Set (EIS) instructions. It also executes Decimal-to-Binary (DTB), Binary-to-Decimal (BTD) conversion EIS instructions and Move-Numeric-Edit EIS instructions in conjunction with the AX unit 30. The DN unit both receives operands from and sends results to the private cache 3. A COMTO ("command to") bus 38 and a COMFROM ("command from") bus 36 couple together the AX unit 30, the DN unit 32 and the FP unit 34 for certain interrelated operations.

The AX unit 30 includes an auxiliary random access memory 40 which is used to store safestore (and other) information. Thus, the contents of the auxiliary RAM 40 are constantly updated with, for example, duplicates of the contents of software visible registers and other relevant information subject to change (collectively, the Safestore Frame or SSF) such that, in the event of the occurrence of a fault from which recovery has been determined to be possible, processing may be restarted at a point just prior to the fault by transferring the most recent register set stored in the auxiliary RAM 40 back to reestablish the register set.
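The update-then-restore behavior of the auxiliary RAM can be illustrated with a toy model. The following is a minimal sketch, assuming a hypothetical `Cpu` class with three named registers; the real safestore holds far more state and is maintained by hardware and firmware, so nothing here should be read as the actual mechanism.

```python
import copy


class Cpu:
    """Toy model: a register set plus a safestore copy refreshed after each operation."""

    def __init__(self):
        self.registers = {"A": 0, "Q": 0, "X1": 0}
        # The safestore always holds the most recent known-good register set.
        self.safestore = copy.deepcopy(self.registers)

    def execute(self, register: str, value: int, fault: bool = False) -> bool:
        if fault:
            # Simulate a faulting operation that leaves a register in a bad state,
            # then recover by copying the safestore back into the registers.
            self.registers[register] = -1
            self.registers = copy.deepcopy(self.safestore)
            return False
        self.registers[register] = value
        # Completed operation: refresh the safestore copy.
        self.safestore = copy.deepcopy(self.registers)
        return True
```

After a faulting call, the registers hold the values from just before the fault, so the operation can be retried from that point.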

The straightforward use of a safestore is known in the prior art as exemplified by U.S. Patent 5,276,862, entitled SAFESTORE FRAME IMPLEMENTATION IN A CENTRAL PROCESSOR by Lowell D. McCulley et al; U.S. Patent 5,553,232, entitled AUTOMATED SAFESTORE STACK GENERATION AND MOVE IN A FAULT TOLERANT CENTRAL PROCESSOR by John E. Wilhite et al; and U.S. Patent 5,557,737 entitled AUTOMATED SAFESTORE STACK GENERATION AND RECOVERY IN A FAULT TOLERANT CENTRAL PROCESSOR by John E. Wilhite et al, all incorporated by reference herein for their disclosure of the repetitive storage of safestore information in a safestore memory and the use of safestore information in recovery from a fault.

As previously noted, the AX unit 30, DN unit 32 and FP unit 34 are, collectively, referred to as the basic processing unit (BPU) 22. Referring now to FIG. 4, it will be understood that the AX unit 30 (except for the auxiliary RAM 40), DN unit 32 and FP unit 34 and their support circuitry 23 (FIG. 2) are represented by the data manipulation logic block 42 in order that the auxiliary RAM 40 can be discussed in greater detail in the following discussion of the invention.

Intercept Process provides a fault preprocessor that can review fault situations and provide machine assembly language level assistance in managing system design problems. The provision of a fault handling module incorporated directly in a mainframe operating system to handle known design errors is well known and effectively permits full functionality operation of a system until a new hardware release is delivered, at which time the fault handling module sections for handling the design errors which have been corrected can be disabled. However, there are conditions under which this basic approach has drawbacks. First, each fault correction routine in a fault handling module integral with an operating system remains until a new version of the operating system (a relatively rare event) even though the fault may have been corrected, for example, by substitution of an updated integrated circuit for one having the design fault. Second, the fault handling facility is not operating system independent for those machines which can run more than one operating system.

In accordance with the present invention, the preprocessor is invoked via CPU hardware/firmware to allow a common machine assembly language routine, stored in RMS 50, to function with a plurality of operating systems. (It may be noted that this technique is also useful in processing Service Processor related tasks.) A system design flow chart of the Intercept Process as adapted to the present invention is shown in FIG. 5.

Thus, as shown in FIG. 5, if a hardware design error is discovered, the CPU firmware is modified to set an Intercept flag (in the example, bit 7 of word 5 in the Safestore Frame) whenever the design error causes a fault. The Intercept Process code is built or modified to process the fault caused by the design error. If and when the design error is later corrected and an integrated circuit chip having the design error has been replaced by a chip in which the error has been corrected, the CPU firmware is modified to eliminate checking for the design fault. If desired, the fault handling code for the fault can be removed from the Intercept Process code stored in RMS.

In accordance with the invention, the intercepting process is carried out using operating system software to detect hardware indicators requesting transfer to the special purpose Intercept Process machine assembly language code which is stored in RMS 50 by the Service Processor 21 during system initialization. As shown in FIG. 6, Intercept Process can, independent of the operating system in use, take corrective action then return to the faulting process or pass the fault back to the operating system fault module if no defined action for Intercept Process is detected. (It may be noted that the Intercept Process can also initiate certain tasks for the Service Processor, then return to the faulting process although this is not shown in FIG. 6 as this feature is not a part of the subject invention.)

Specific faults to be intercepted are established during the initialization of the CPU within its firmware. Thus, the Intercept flag can be set on any fault as determined to be necessary by the current processor firmware.

As previously described, the Intercept Process is a machine assembly language level process that is loaded into RMS 50 upon system initialization. By being resident in RMS and implemented in machine assembly language, the Intercept Process is functional with all operating systems which run on a given hardware system. The Intercept Process is configured such that any fault type can cause the CPU firmware to invoke it. This is done by setting a dedicated Intercept Fault flag in the "fault type" word of the SSF. Therefore, an Intercept Fault can "piggy-back" on a system level fault.

The Intercept Fault flag has priority over all system level faults. When the operating system detects the presence of the Intercept Fault flag, it transfers control to the Intercept Process before any processing of the current fault is performed. This will permit any corrective action of which the Intercept Process is capable to occur before an undesired recovery action is taken by the operating system's fault module.

The Intercept Process is customized for each release/version of the CPU firmware. This feature provides the ability for each release of processor firmware to specify the revision(s) to Intercept Process. This is a substantive improvement over the prior art in which the operating system fault module had to carry fixes for all known design errors which ever existed in the CPU design (at least since the previous operating system release) because the operating system fault module could not determine the CPU firmware revision.

This process provides the ability to use CPUs (under one or more proprietary operating systems) with known design errors and to quickly resolve customer problems while the hardware change is being developed. The Intercept Process is tied to the release of processor firmware. When the Service Processor loads the selected processor firmware during initialization, it also places the corresponding Intercept Process into Reserved Memory; thus, the firmware version and the Intercept Process version must match.
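The pairing of firmware release and Intercept Process version can be pictured as a lookup performed at initialization. The sketch below is an assumption-laden illustration: the function name `load_system_images`, the release strings, and the dictionary-based catalog are all hypothetical, and the specification does not say how the Service Processor actually selects the matching image.

```python
def load_system_images(firmware_release: str, intercept_images: dict) -> dict:
    """Select the Intercept Process image that matches the chosen firmware release.

    `intercept_images` is a hypothetical catalog mapping a firmware release
    identifier to the Intercept Process image built for that release.
    """
    try:
        image = intercept_images[firmware_release]
    except KeyError:
        # Refuse to initialize with mismatched versions.
        raise ValueError(
            f"no Intercept Process image for firmware release {firmware_release!r}"
        ) from None
    # Model "placing the image into Reserved Memory" as returning both together.
    return {"firmware": firmware_release, "reserved_memory": image}
```

Keying the catalog by firmware release is one simple way to guarantee that the two versions cannot drift apart at load time.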

As previously noted, the Intercept Process resides in RMS 50 in main memory 17 starting on a page boundary and, in the example, is sixteen consecutive pages in size. These are real pages of memory defined by the Service Processor 21 when the Intercept Process is loaded during system initialization. The location of the Intercept Process in RMS is defined in a predetermined word of the system configuration area of RMS.

Preferably, the mapping of the real pages in RMS storing Intercept Process code is into the same working space as the operating system's fault module. Otherwise, the Intercept Process would be required to correct for problems in all working spaces in the system, and the changing of the working space registers' contents would cause additional work for the operating system upon return from the Intercept Process.

The Intercept Process requires only limited use of the system registers to perform its function in order to correspondingly limit the impact upon the operating system when the Intercept Process is invoked. The following defines the descriptor and pointer registers used by the Intercept Process:

D0 = Return descriptor with pointer register having the location within the operating system's fault module.

D3 = Frames the SSF entry that has the Intercept Process flag set. The Intercept Process will look at data from the SSF to determine the corrective action required.

D4 = The Instruction Segment Register of the failed process. (There may be cases where the Intercept Process will have to examine the failing code as part of the recovery routine.)

In the example, the Intercept Process will transfer through "Pointer Register 0" to return to the operating system fault module. An "inter-segment transfer" is used to return to the operating system fault module to avoid affecting the Safestore Frame stack. As shown in FIG. 6, if there is no process for handling a given design fault resident in Intercept Process, a Transfer through Pointer Register 0 by Intercept Process causes the operating system fault module to try a restart of the faulting process (i.e., invoke a retry of the failed instruction). (If it faults again, the problem may be referred to the Service Processor, and a CPU freeze could result.) A Transfer through Pointer Register 0-plus-one by Intercept Process causes the operating system fault module to process the fault defined in the SSF entry.
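The two return paths can be viewed as two adjacent entry points in the operating system fault module. The following sketch models only the OS-side meaning of each entry point as described above; the constant names, the function `os_fault_module_reentry`, and the returned strings are hypothetical labels chosen for illustration.

```python
RETRY_FAULTING_PROCESS = 0  # transfer through Pointer Register 0
PROCESS_FAULT_IN_OS = 1     # transfer through Pointer Register 0-plus-one


def os_fault_module_reentry(offset: int, retry_already_failed: bool = False) -> str:
    """Describe what the OS fault module does for each return entry point."""
    if offset == RETRY_FAULTING_PROCESS:
        if retry_already_failed:
            # A repeated fault may be referred onward instead of retried again.
            return "refer to Service Processor"
        return "retry failed instruction"
    if offset == PROCESS_FAULT_IN_OS:
        return "process fault defined in SSF entry"
    raise ValueError(f"unknown return offset: {offset}")
```

Using an offset from a single return pointer lets the Intercept Process select the OS behavior without needing any further knowledge of the OS fault module's layout.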

Assuming that a patch for a given fault is resident in the current Intercept Process version, the Intercept Process uses "Descriptor Register 3" to access the faulting process SSF. Referring briefly to FIG. 4, to analyze the reason for the request to the Intercept Process and to handle the fault, the following data in the SSF is needed:

Firmware Address         (SSF Word 1)
IC of Fault              (SSF Word 3 or 4)
Fault Flags and Code     (SSF Word 5)
ISR type (ns/ei)         (SSF Word 8)
Index (X) Registers      (SSF Words 40-43)
A and Q Registers        (SSF Words 44-45)
Descriptors/Pointers     (SSF Words 48-53)

The A and Q Registers are the CPU's accumulator and supplementary accumulator registers, respectively. Other SSF information may be referenced. The Intercept Process, which modifies only limited information within the SSF, uses "Descriptor Register 4" to access the faulting process instruction stream. To understand the reason why the fault occurred requires an analysis of the instruction stream for conditions that can occur with the hardware signature value. The Intercept Process will not modify any instruction within the stream.

The Intercept Process, as necessary in the example, uses the A, Q, X1, X4 and X5 registers to perform its analysis of the intercepted fault. All program visible registers will be saved before they are used by the Intercept Process, and the contents of these registers will be restored to their original values before exiting the Intercept Process.
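The save-before-use, restore-before-exit discipline can be expressed compactly as a context manager. This is a hedged sketch: the helper name `preserved` and the dictionary representation of the register set are assumptions made for illustration, and the real Intercept Process would do this with explicit machine instructions.

```python
from contextlib import contextmanager


@contextmanager
def preserved(registers: dict, names: list):
    """Save the named registers on entry and restore them on exit."""
    saved = {name: registers[name] for name in names}
    try:
        yield registers
    finally:
        # Restore original values even if the analysis code raises.
        registers.update(saved)
```

A usage example: inside `with preserved(regs, ["A", "Q", "X1", "X4", "X5"]) as r:` the analysis code may freely overwrite `r["A"]`, and on leaving the block every listed register holds its original value again.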

When any of the defined CPU faults occur, they proceed through the CPU fault priority logic and form a seven-bit fault code for the highest priority fault. After this step is performed in the normal way, the CPU fault firmware compares the seven-bit code to determine if the detected fault is in the group intended for "intercept" handling. If it is, bit 7 in SSF Word 5 is set "on" indicating that an intercept request is nested with this fault. The Intercept Process is not introduced into hardware and therefore does not have any effect on existing fault priorities.
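The firmware comparison and flag setting described above can be sketched as simple bit manipulation. Several details below are assumptions not stated in the text: a 36-bit word, bit 0 as the most significant bit, and the seven-bit fault code occupying the low-order bits of the word. Only the use of bit 7 of SSF Word 5 as the intercept flag comes from the description.

```python
WORD_BITS = 36           # assumption: word width is not stated in the text
INTERCEPT_BIT = 7        # from the text: bit 7 of SSF Word 5
FAULT_CODE_MASK = 0x7F   # seven-bit fault code


def bit_mask(bit: int, word_bits: int = WORD_BITS) -> int:
    # Assumption: bit 0 is the most significant bit of the word.
    return 1 << (word_bits - 1 - bit)


def build_fault_word(fault_code: int, intercept_group: set) -> int:
    """Form SSF Word 5 for a fault, nesting an intercept request if applicable."""
    if not 0 <= fault_code <= FAULT_CODE_MASK:
        raise ValueError("fault code must fit in seven bits")
    word = fault_code  # assumption: the code sits in the low-order bits
    if fault_code in intercept_group:
        word |= bit_mask(INTERCEPT_BIT)
    return word


def intercept_requested(word: int) -> bool:
    return bool(word & bit_mask(INTERCEPT_BIT))
```

Because the flag is a separate bit from the fault code, the original code remains readable by the operating system fault module whether or not an intercept request is nested with it.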

As shown in FIG. 6, when the Intercept Flag is set on in the current SSF Word 5, the operating system will transfer to the first location of the Intercept Process code using a descriptor established during system initialization.

The code within the Intercept Process does not generate faults, as the operating system fault module in the example is not able to handle a second fault (there is no hardware enforcement of this rule). In a multiprocessor system, the Intercept Process returns to the same CPU (conventionally identified by a System Identification Number or the equivalent) that initiated the request for service. This measure reduces the overhead of managing interrupts and faults within the Intercept Process.

Two special purpose instructions are used with the Intercept Process. These two instructions are used to provide certain processor mode permissions while executing within the Intercept Process. The two special purpose instructions are SICPM (Set Intercept Mode) and RICPM (Reset Intercept Mode). The SICPM instruction is one of the first instructions executed by the Intercept Process after control has been transferred to it, to ensure that the correct CPU mode and permissions are set before any other instructions are executed. The RICPM instruction is one of the last instructions executed by the Intercept Process each time it is called to recover a fault. This instruction resets the processor mode and permissions to the state that they were in when the SICPM instruction was executed.
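The pairing of SICPM and RICPM can be pictured as a save-and-restore bracket around the body of the Intercept Process. The following is a minimal sketch under stated assumptions: the Cpu class, the mode names and the permission names are hypothetical and stand in for processor state that this description does not enumerate.

```python
from contextlib import contextmanager


class Cpu:
    """Hypothetical model of the mode and permission state of one processor."""

    def __init__(self, mode: str, permissions: frozenset):
        self.mode = mode
        self.permissions = permissions
        self._saved = None

    def sicpm(self) -> None:
        """Set Intercept Mode: remember the current state, then switch."""
        self._saved = (self.mode, self.permissions)
        self.mode = "intercept"                       # hypothetical mode name
        self.permissions = frozenset({"privileged"})  # hypothetical permission

    def ricpm(self) -> None:
        """Reset Intercept Mode: return to the state seen when SICPM executed."""
        self.mode, self.permissions = self._saved
        self._saved = None


@contextmanager
def intercept_mode(cpu: Cpu):
    """Run a block between SICPM and RICPM, restoring state on the way out."""
    cpu.sicpm()
    try:
        yield cpu
    finally:
        cpu.ricpm()
```

Wrapping the body in `with intercept_mode(cpu):` mirrors the rule that SICPM is among the first instructions executed and RICPM among the last.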

The Intercept Process enters at location zero of the first page that was defined to the operating system by the Service Processor when the system is initialized. This common entry point is the only entry point into the Intercept Process. The Intercept Process needs to identify itself to the system when it is in execution so that the hardware and firmware can automatically recover from errors encountered due to system mode and permissions. The Intercept Process identifies itself to the hardware by the execution of the previously discussed SICPM instruction. The execution of this instruction causes the hardware to set an internal flag indicating that the Intercept Process is temporarily in control of the CPU. A housekeeping routine saves the processor's program visible registers. In the example, the registers are saved by physical processor using X Register 7. This ensures that the Intercept Process will have complete use of all program visible registers and still be able to return to the operating system fault module with the registers in the state and with the content they had when the Intercept Process was called.
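Saving the registers "by physical processor" can be read as keeping one save slot per CPU. The sketch below illustrates that reading only; how X Register 7 selects the slot is not detailed here, so the cpu_number index and the SaveArea class are assumptions.

```python
class SaveArea:
    """Hypothetical per-processor save area for program visible registers."""

    def __init__(self, cpu_count: int):
        self._slots = [None] * cpu_count  # one slot per physical processor

    def save(self, cpu_number: int, registers: dict) -> None:
        """Copy the registers into the slot owned by this processor."""
        self._slots[cpu_number] = dict(registers)

    def restore(self, cpu_number: int, registers: dict) -> None:
        """Put back exactly what was saved, discarding any later changes."""
        saved = self._slots[cpu_number]
        if saved is None:
            raise RuntimeError("no registers saved for this processor")
        registers.clear()
        registers.update(saved)
        self._slots[cpu_number] = None
```

Separate slots let two processors be inside the Intercept Process at once without overwriting each other's saved state.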

The first step of processing an Intercept request is to determine the type of fault that caused the request. This is achieved by getting the fault type from the current SSF. DR3 is required to frame the current SSF. The seven-bit fault type generated by the CPU firmware is resident in bits 11 - 17 of Word 5 in the SSF. This value is used as an index into a processing table.

The next level of the Intercept Process execution routine is fetched from a functionality table. If the entry in this table for the fault type is not negative (bit zero = "0"), then the entry has the offset within the Intercept Process where the present fault type is to be processed. (If the entry in this table for the fault type is negative (bit zero = "1"), then there is no next level process routine, and control will be transferred to process a Service Processor I/O request. This feature is outside the present invention.)
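The extraction of the fault type and the table lookup can be sketched as follows. This is an illustration rather than the Intercept Process code: it assumes 36-bit words with bit 0 as the most significant (sign) bit, and the table contents and function names are hypothetical.

```python
WORD_BITS = 36                    # assumption: 36-bit words, bit 0 is the sign bit
SIGN_BIT = 1 << (WORD_BITS - 1)   # "bit zero" of a table entry


def fault_type(word5: int) -> int:
    """Extract the seven-bit fault type held in bits 11-17 of SSF Word 5."""
    shift = WORD_BITS - 1 - 17    # bit 17 is the low-order bit of the field
    return (word5 >> shift) & 0x7F


def next_level_offset(table: list, word5: int):
    """Return the routine offset for this fault type, or None if the entry is negative."""
    entry = table[fault_type(word5)]
    if entry & SIGN_BIT:
        return None               # negative entry: no next level routine
    return entry


# Hypothetical table: every fault type is negative except one with a routine.
TABLE = [SIGN_BIT] * 128
TABLE[0o21] = 0o400
```

Since the fault type is seven bits wide, a table of 128 entries covers every possible index without a bounds check.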

In this manner, control is passed to the execution routine within the Intercept Process for handling the specific fault type. It is at this level that the rules about the fault are applied to determine if the current fault qualifies for recovery. The rules, of course, vary depending upon the fault type and hardware problem(s) that caused the error conditions. When Intercept Process support is required for a certain fault type, the fault handling function is appropriately defined.

The Intercept Process has two exits, and both cause the operating system to continue processing, but with different logic paths. A first exit from the Intercept Process causes the operating system to retry the failed instruction after the fault has been handled by the Intercept Process. A second exit from the Intercept Process causes the operating system to continue its processing of the fault and to resolve it if possible. (Although not shown in FIG. 6, it is also possible that the operating system fault module will refer the fault to the Service Processor, which might freeze the faulting CPU and reconfigure the system as necessary.)

Regardless of the exit point, the program visible registers must be restored before returning to the operating system's fault module. The Intercept flag (bit 7 in Word 5 in the SSF) will be reset as part of the wrap up processing in the Intercept Process.
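The two exits and the common wrap up processing can be sketched together. This is a minimal sketch, assuming 36-bit words with bit 0 as the most significant bit; the SSF is modeled as a list of words, and the `recover` callback and the Exit names are hypothetical.

```python
from enum import Enum

WORD_BITS = 36                                # assumption: bit 0 is the MSB
INTERCEPT_FLAG = 1 << (WORD_BITS - 1 - 7)     # bit 7 of SSF Word 5


class Exit(Enum):
    RETRY = 1      # first exit: the operating system retries the failed instruction
    CONTINUE = 2   # second exit: the operating system goes on processing the fault


def run_intercept(ssf: list, registers: dict, recover) -> Exit:
    """Run a fault-specific routine, then perform the wrap up common to both exits."""
    saved = dict(registers)                   # save program visible registers
    try:
        recovered = recover(ssf, registers)   # hypothetical fault-specific routine
    finally:
        registers.clear()
        registers.update(saved)               # restore registers on either exit
        ssf[5] &= ~INTERCEPT_FLAG             # reset the Intercept flag
    return Exit.RETRY if recovered else Exit.CONTINUE
```

Placing the restore and the flag reset in one `finally` block reflects that both are required regardless of which exit is taken.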

As previously noted, the return from the Intercept Process to the operating system fault module is a lateral transfer using Pointer and Descriptor Register Zero. This type of transfer must be used since it is the same type of transfer used to enter the Intercept Process.

The foregoing description describes the invention in the environment of a multiprocessor computer system; however, it will be appreciated by those skilled in the art that the invention may be used with equal effect in a uniprocessor system which includes iterative execution instructions.

Thus, while the principles of the invention have now been made clear in an illustrative embodiment, there will be immediately obvious to those skilled in the art many modifications of structure, arrangements, proportions, the elements, materials, and components, used in the practice of the invention which are particularly adapted for specific environments and operating requirements without departing from those principles.

1. In a computer system functioning under an operating system including a fault handling module, a process for handling a central processor fault comprising the steps of:

A) when a central processor fault occurs during an operation:
    1) setting an intercept flag if the fault is to be directed to a preprocessor;
    2) establishing a safestore frame which includes information identifying the type of fault and whether the intercept flag is set; and
    3) transferring control to the operating system fault handling module;
B) in the operating system fault handling module, determining whether the intercept flag is set;
C) if the intercept flag is not set, handling the fault in the operating system fault module and going to step F);
D) if the intercept flag is set, transferring control from the operating system fault module to an intercept process written in machine language;
E) handling the fault in the intercept process; and
F) retrying the operation which caused the fault in the

2. The process of Claim 1 in which modifiable central processor firmware is employed to sense the central processor fault and to selectively set the intercept flag while establishing the safestore frame.

3. The process of Claim 2 in which the central processor firmware is configured to recognize a central processor fault which is due to a known hardware design error.

4. The process of Claim 3 in which, after the known hardware design error has been corrected, the central processor firmware is reconfigured to eliminate monitoring for faults due to the known hardware design error.

5. The process of Claim 4 in which the current version of the central processor firmware and the current version of the intercept process are each matched to a central processor hardware release currently in use for a given central processor.

6. The process of Claim 2 in which the intercept process and central processor firmware are loaded upon initialization of the system in which the central processor is resident.

7. The process of Claim 3 in which the intercept process and central processor firmware are loaded upon initialization of the system in which the central processor is resident.

8. The process of Claim 4 in which the intercept process and central processor firmware are loaded upon initialization of the system in which the central processor is resident.

9. The process of Claim 5 in which the intercept process and central processor firmware are loaded upon initialization of the system in which the central processor is resident.